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Shiftable  Multiscale  Transforms 

Eero  P.  Simoncelli,  William  T.  Freeman,  Edward  H.  Adelson,  and  David  J.  Heeger 


Abstract— Orthogonal  wavelet  transforms  have  recently  be¬ 
come  a  popular  representation  for  multiscale  signal  and  image 
analysis.  One  of  the  major  drawbacks  of  these  representations  is 
their  lack  of  translation  invariance:  the  content  of  wavelet 
subbands  is  unstable  under  translations  of  the  input  signal. 
Wavelet  transforms  are  also  unstable  with  respect  to  dilations  of 
the  input  signal,  and  in  two  dimensions,  rotations  of  the  input 
signal.  We  formalize  these  problems  by  defining  a  type  of 
translation  invariance  that  we  call  “shiftability”.  In  the  spatial 
domain,  shiftability  corresponds  to  a  lack  of  aliasing;  thus,  the 
conditions  under  which  the  property  holds  are  specified  by  the 
sampling  theorem.  Shiftability  may  also  be  considered  in  the 
context  of  other  domains,  particularly  orientation  and  scale. 
“Jointly  shiftable’’  transforms  that  are  simultaneously  shiftable 
in  more  than  one  domain  are  explored.  Two  examples  of  jointly 
shiftable  transforms  are  designed  and  implemented:  a  one-di¬ 
mensional  transform  that  is  jointly  shiftable  in  position  and 
scale,  and  a  two-dimensional  transform  that  is  jointly  shiftable 
in  position  and  orientation.  The  usefulness  of  these  image 
representations  for  scale-space  analysis,  stereo  disparity  mea¬ 
surement,  and  image  enhancement  is  demonstrated. 

Index  Terms— Wavelet,  multiscale,  pyramid,  aliasing,  sampling, 
interpolation,  orientation,  steerable  filters,  image  representa¬ 
tion,  image  processing. 


I.  Introduction 

IN  many  signal  processing  applications,  a  signal  is  decom¬ 
posed  into  a  set  of  subbands,  and  the  information  within 
each  subband  is  processed  more  or  less  independently  of  that 
in  the  other  subbands.  In  classical  signal  processing,  the 
subbands  are  each  sampled  at  a  rate  above  the  Nyquist  limit 
in  order  to  avoid  aliasing.  If  the  subband  spectra  overlap, 
then  the  transform  will  be  oversampled,  and  thus,  overcom¬ 
plete. 

Recently  it  has  become  popular  to  use  discrete  subband 
decompositions  that  are  critically  sampled,  i.e.,  in  which  the 
number  of  samples  in  the  representation  is  equal  to  the 
number  in  the  signal.  Quadrature  mirror  filters  (QMF’s)  and 
wavelets,  which  are  closely  related,  allow  one  to  violate  the 
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Nyquist  criterion  without  discarding  information;  indeed  these 
subband  decompositions  constitute  orthogonal  subband  trans¬ 
forms.  They  achieve  this  feat  by  ensuring  that  the  aliasing 
errors  from  all  of  the  subbands  cancel  when  the  bands  are 
recombined. 

These  transforms  have  proven  to  be  highly  successful  for 
efficient  image  coding  [45],  [50],  [17],  [3],  [43],  [2],  [49] 
and  they  have  been  proposed  as  useful  frameworks  for  many 
other  signal  processing  tasks.  One  of  the  appealing  properties 
of  wavelet  transforms  is  that  they  represent  the  signal  with  a 
set  of  basis  functions  that  are  related  by  translation  and 
dilation.  These  operators  correspond  to  common  “physical” 
transformations  of  signals.  Since  the  basis  functions  have  this 
simple  relationship,  one  might  hope  that  the  transform  coef¬ 
ficients  would  behave  in  a  simple  manner  when  the  input 
signal  is  translated  or  dilated.  However,  the  situation  turns 
out  to  be  rather  complex. 

To  illustrate  this  point,  we  consider  the  case  of  spatial 
translation.  We  take  as  our  transform  a  three-level  pyramid 
constructed  using  the  discrete  dyadic  Daubechies  wavelet 
kernel  [9]  of  length  four.  The  input  signal  is  shown  at  the  top 
of  Fig.  1(a).  For  illustrative  purposes,  we  have  chosen  this 
signal  to  be  equal  to  one  of  the  wavelet  basis  functions  in  the 
second  subband.  Thus,  its  transform  is  a  single  impulse 
within  that  subband.  All  of  the  other  coefficients  in  the 
transform  are  zero.  The  coefficients  of  the  three  subbands  are 
plotted  in  Fig.  1(b)— (d).  Now  suppose  we  translate  the  input 
signal  one  sample  to  the  right,  as  shown  in  Fig.  1(e).  The 
coefficients  of  the  transform,  plotted  in  Fig.  1  (f) — (h) .  change 
dramatically.  Coefficient  power  is  now  spread  broadly  within 
and  across  the  subbands.  Thus,  the  representation  is  highly 
dependent  on  the  relative  alignment  of  the  input  signal  with 
the  subsampling  lattices.  This  is  hardly  the  behavior  that  one 
would  like  to  see  when  attempting  to  do  translation-invariant 
signal  processing. 

The  fact  that  the  transform  contents  jump  around  in  our 
example  leads  us  to  ask  what  we  can  hope  to  achieve  in  the 
way  of  translation  invariance.  We  cannot  literally  expect 
translation  invariance  in  a  system  based  on  convolution  and 
subsampling:  translation  of  the  input  signal  cannot  produce 
simple  translations  of  the  transform  coefficients,  unless  the 
translation  is  a  multiple  of  each  of  the  subsampling  factors  in 
the  system.  It  is,  however,  possible  to  achieve  a  weaker  form 
of  translation  invariance  in  the  sense  that  all  the  information 
represented  within  a  subband  remains  in  that  subband  as  the 
input  signal  is  translated.  A  necessary  and  sufficient  condition 
for  this  is  just  the  Nyquist  criterion.  Since  critically  sampled 
subband  transforms  (such  as  wavelet  transforms)  typically 
violate  the  Nyquist  criterion,  the  information  moves  from  one 
band  to  another  under  translation. 
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Fig.  1.  Effect  of  translation  on  the  wavelet  representation  of  a  signal,  (a) 
Input  signal,  which  is  equal  to  one  of  the  wavelet  basis  functions,  (b)-(d) 
Decomposition  of  the  signal  into  three  wavelet  subbands.  Plotted  are  the 
coefficients  of  each  subband.  Dots  correspond  to  zero- value  coefficients,  (e) 
Same  input  signal,  translated  one  sample  to  to  the  right,  (f)-(h)  Decomposi¬ 
tion  of  the  shifted  signal  into  three  wavelet  subbands.  Note  the  drastic 
change  in  the  coefficients  of  the  transform,  both  within  and  between  sub¬ 
bands. 


These  issues  have  been  raised  elsewhere.  Strang  mentions 
the  translation  invariance  problem  as  one  of  the  primary 
weaknesses  of  the  wavelet  transform  [42].  In  the  context  of 
the  stereo  matching  problem,  Mallat  addressed  this  problem 
by  building  a  translation-invariant  representation  from  the 
zero-crossings  of  the  full-density  wavelet  subbands  [30] . 

The  translation  invariance  issue  serves  as  a  starting  point 
for  our  discussion,  but  one  of  the  purposes  of  this  paper  is  to 
show  that  this  problem  may  also  be  considered  in  other 
domains.  In  two  dimensions,  wavelet  transforms  may  be 
computed  using  basis  functions  at  a  particular  set  of  posi¬ 
tions,  scales  and  orientations.  The  same  problems  of  invari¬ 
ance  can  be  stated  for  the  parameters  of  scale  and  orientation. 
The  sampling  of  these  domains  is  analogous  to  the  sampling 
in  the  spatial  domain,  and  as  with  translations,  two-dimen¬ 
sional  wavelet  transforms  typically  do  not  behave  well  under 
rotations  or  dilations  of  the  input  signal. 

The  desire  for  invariance  with  respect  to  these  simple 
operations  leads  us  to  reexamine  the  critical  sampling  con¬ 
straint  of  the  wavelet  representation.  More  specifically,  it 
may  seem  that  the  use  of  ideal  (sine)  kernels  could  solve  the 
translation  invariance  problem  illustrated  in  Fig.  1.  But  we 
will  show  that  a  critically-sampled  transform  based  on  these 


kernels  lacks  scale-invariance:  the  transform  does  not  behave 
well  under  dilations  of  the  input  signal. 

In  this  paper,  we  formalize  this  line  of  thought  by  defining 
the  property  of  “shiftability.”  We  first  discuss  this  property 
with  respect  to  individual  parameters:  spatial  position,  orien¬ 
tation,  and  scale.  We  then  discuss  transformations  that  are 
simultaneously  shiftable  with  respect  to  subsets  of  these 
parameters.  In  order  to  achieve  shiftability,  we  show  that  we 
must  relax  the  critical  sampling  condition  of  the  wavelet 
transforms,  thus  abandoning  the  elegance  of  orthogonality. 
Finally,  we  implement  two  example  transforms  to  illustrate 
these  properties  and  we  apply  them  to  several  signal  and 
image  processing  problems. 

II.  Background 

In  this  section,  we  define  some  terminology  and  review 
literature  on  linear  transforms  and  representations  related  to 
our  topic. 

A  complete  linear  transform  represents  a  signal  as  a 
weighted  sum  of  basis  functions.  That  is,  a  signal,  fix),  is 
represented  as  a  sum  over  an  indexed  collection  of  functions, 
gi(X)- 

f{x)  =  £  T/g,-(jf),  (1) 

i 

where  the  yt  are  the  transform  coefficients.  These  coefficients 
are  computed  from  the  signal  by  projecting  onto  a  set  of 
functions  that  we  call  projection  functions,  hj  x): 

yf=  J  dxh;(x)f(x).  (2) 

Equation  (2)  is  used  to  compute  the  transform  of  the  signal. 
The  original  signal  may  be  recovered  from  the  coefficients, 
yh  and  the  basis  functions,  gjx),  using  (1). 

In  systems  based  on  convolution  operations,  the  projection 
functions  are  shifted  copies  of  the  (reversed)  convolution 
kernels.  From  the  definition  of  convolution,  the  projection 
functions  corresponding  to  a  single  convolution  and  sampling 
stage  are 

M*)  =  M'Av  -  x). 

In  this  case,  the  subscript  i  indexes  the  spatial  position  of  its 
associated  basis  function,  and  Ax  is  the  sample  spacing. 

The  basis  functions,  h,(  x),  are  said  to  be  linearly  inde¬ 
pendent  if  there  is  no  linear  combination  of  them  that  is  zero 
for  all  x.  In  discrete  systems,  a  complete  transform  with 
linearly  independent  basis  functions  is  called  critically  sam¬ 
pled:  the  coefficient  output  rate  is  equal  to  the  input  signal 
sample  rate.  If  the  set  of  basis  functions  is  not  linearly 
independent,  then  the  transform  is  said  to  be  overcompiete. 
In  this  case,  the  corresponding  projection  functions  are  not 
unique.  We  will  refer  to  a  transform  for  which  hfx)  =  g,(x) 
as  self -inverting .  If  the  transform  is  self-inverting  and  the 
basis  functions  are  linearly  independent,  then  the  transform  is 
orthonormal  (i.e.,  orthogonal  and  normalized).  One  can. 
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however,  construct  transforms  that  are  self-inverting  but  not 
orthonormal,  as  we  will  discuss  later. 

Often,  one  is  concerned  with  the  properties  of  the  individ¬ 
ual  basis  and/or  projection  functions.  In  particular,  signal 
analysis  problems  often  require  the  use  of  functions  that  are 
spatially  localized  or  “tuned.”  The  property  of  localization 
implies  a  localization  measure:  for  example,  a  function  that  is 
localized  in  space  might  have  a  restricted  region  of  support, 
or  a  finite  variance  (second  moment).  Similarly,  one  can 
consider  functions  that  are  localized  in  the  Fourier  domain.  A 
function  that  has  a  localized  Fourier  spectrum  is  tuned  for  a 
particular  range  of  scales.  Shifting  this  spectrum  along  the 
log  Fourier  axis  corresponds  to  a  dilation  of  the  function  in 
the  spatial  domain. 

In  two  dimensions,  the  Fourier  domain  may  be  parameter¬ 
ized  in  terms  of  polar  coordinates.  Intuitively,  this  is  a  more 
natural  coordinate  system  than  the  Cartesian  system:  the 
(logarithmic)  radial  component  corresponds  to  scale,  and  the 
angular  coordinate  corresponds  to  orientation.  A  two-dimen¬ 
sional  function  whose  Fourier  spectrum  is  localized  in  these 
parameters  is  tuned  for  scale  and  orientation,  and  translation 
in  this  two-dimensional  (log  polar)  Fourier  domain  corre¬ 
sponds  to  dilation  and  rotation  of  the  function  in  the  spatial 
domain. 

A  function  that  is  simultaneously  tuned  for  several  parame¬ 
ters  is  said  to  be  jointly  localized  in  those  parameters.  The 
concept  of  joint  localization  in  space  and  spatial  frequency 
was  introduced  by  Gabor  in  1946  [16].  He  derived  a  lower 
bound  for  a  particular  measure  of  joint  localization,  and 
determined  the  unique  set  of  functions  (products  of  sinusoids 
and  Gaussians)  that  achieve  this  limit.1  More  recently,  Daug- 
man  extended  the  Gabor  basis  set  to  two  dimensions,  and 
emphasized  the  importance  of  analyzing  orientation  in  images 
[11]. 

Many  authors  have  worked  on  the  analysis  of  orientation 
(e.g.,  Knutsson  and  Granlund  [22]).  Freeman  and  Adelson 
[14],  [15]  developed  transforms  for  analysis  of  oriented 
structure  in  images.  They  developed  a  technique  to  synthe¬ 
size  kernels  of  arbitrary  orientation  from  linear  combinations 
of  a  fixed  set  of  oriented  basis  kernels,  and  they  called  these 
basis  sets  “steerable”  filters.  They  describe  the  conditions 
under  which  a  given  set  of  rotated  basis  functions  is  steer¬ 
able.  Perona  [35]  has  described  an  alternative  formulation  of 
the  steerability  property  and  recently  extended  this  to  the 
analysis  of  scale. 

Multiscale  analysis  has  been  another  important  topic  in 
signal  processing.  The  basis  functions  of  Gabor’s  original 
transform  were  chosen  to  cover  equal- width  frequency  bands. 
These  functions  are  therefore  related  by  modulation  in  the 
frequency  domain.  In  order  to  create  representations  more 
suited  for  analysis  of  scale,  many  authors  in  signal  processing 
and  computer  vision  as  well  as  biological  vision  have  advo¬ 
cated  transforms  with  constant  logarithmic-width  (constant 
Q)  frequency  bands  [19],  [31],  [48],  [24],  [1],  [34],  [28], 

1  Lemer  [25]  later  noted  that  the  choice  of  a  different  localization  measure 
could  result  in  a  different  class  of  ‘'optimal"  function,  casting  some  doubt 
on  the  unique  importance  of  Gabor’s  class  of  functions. 
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The  basis  functions  of  such  a  transform  are  related  by 
dilation  operations.  Burt  and  Adelson  [5]  developed  an  effi¬ 
cient  octave-width  subband  transform  called  the  “Laplacian 
pyramid”  and  used  it  for  image  coding  and  other  image 
processing  tasks.  An  advantage  of  their  technique  is  the 
computational  efficiency  of  the  recursive  pyramid  structure. 

In  an  independent  context,  mathematicians  developed  a 
form  of  continuous  function  representation  called  wavelets 
[20],  [9],  [10],  The  defining  characteristic  of  a  wavelet 
transform  is  that  the  basis  functions  are  translations  and 
dilations  of  a  common  kernel  (see  [42]  for  an  introduction). 
Thus,  the  wavelet  transform  performs  a  scale  decomposition 
similar  to  the  Laplacian  pyramid.  Furthermore,  it  may  be 
implemented  in  a  recursive  manner  similar  to  the  Laplacian 
pyramid.  Finally,  the  term  wavelet  is  usually  assumed  to 
refer  to  an  orthonormal  basis  set. 

Prior  to  the  development  of  wavelets,  octave-width  sub¬ 
band  transforms  based  on  quadrature  mirror  filters  (QMF’s) 
were  developed  in  the  signal  processing  community  [7],  [13], 
[45],  [44],  [46],  [39],  [29].  In  the  discrete  domain,  pyramids 
constructed  from  QMF’s  may  be  considered  as  wavelet  trans¬ 
forms,  although  they  are  often  only  approximately  orthonor¬ 
mal.  They  have  been  used  for  various  tasks  requiring  multi¬ 
scale  signal  analysis,  and  have  been  found  especially  effective 
for  data  compression.  In  two  dimensions,  separable  imple- 
mentatons  have  been  used  for  image  compression  [50],  [17], 
[3],  [43].  QMF’s  with  oriented  basis  functions  that  are 
related  by  translations,  dilations  and  rotations  have  been 
developed  on  hexagonal  sampling  lattices  [3],  [40],  Several 
other  authors  have  developed  non-orthogonal  oriented  multi¬ 
scale  transforms  in  which  the  basis  functions  are  translations 
and  rotations  of  a  common  function  [12],  [19],  [32],  [47]. 

As  mentioned  in  the  introduction,  one  major  drawback  of 
using  wavelets  (or  QMF’s)  for  signal  processing  applications 
is  their  lack  of  translation  invariance  [29] ,  [42] .  In  the  next 
section,  we  address  this  problem  by  defining  shiftability ,  a 
generalization  of  both  the  sampling  theorem  and  the  steerabil¬ 
ity  property  introduced  by  Freeman  and  Adelson  [14],  [15], 
We  discuss  this  property  in  the  context  of  different  image 
transform  parameters:  position,  scale,  and  orientation.  To 
achieve  shiftability,  we  show  that  one  must  relax  the  orthogo¬ 
nality  property  of  the  wavelet  transform.  We  do  not,  how¬ 
ever,  need  to  abandon  the  property  of  self-inversion. 

The  concept  of  joint  localization  introduced  by  Gabor 
motivates  the  decomposition  of  images  in  terms  of  spatial 
position,  scale,  and  orientation.  However,  this  does  not 
guarantee  that  these  parameters  are  easily  accessible.  The 
concept  of  shiftability  that  we  introduce  here  emphasizes  the 
sampling  and  interpolation  of  this  parameter  space,  and  the 
ability  to  independently  access  these  parameters. 

III.  Shiftability 

In  this  section,  we  define  the  property  of  transform  shifta¬ 
bility  and  discuss  its  application  to  the  domains  of  spatial 
position,  orientation,  and  scale. 

Suppose  that  the  basis  functions  of  a  two-dimensional 
transform  are  translations,  dilations,  and  rotations  of  a  com- 
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mon  function  (kernel).  Suppose  also  that  this  kernel  is  simul¬ 
taneously  localized  in  spatial  position,  scale  and  orientation. 
Then  each  basis  function  may  be  associated  with  a  point  in 
the  space  of  parameters,  (x,  y,  r,  8),  where  x  and  y  corre¬ 
spond  to  the  spatial  location  and  r  and  9  correspond  to  scale 
and  orientation  (the  log  polar  coordinates  of  the  Fourier 
domain).  Given  the  transform  coefficients  associated  with 
these  basis  functions,  we  want  to  be  able  to  compute  coeffi¬ 
cient  values  corresponding  to  any  point  in  this  parameter 
space.  Although  the  four  parameters  may  assume  continuous 
values,  we  do  not  want  an  infinite  set  of  basis  functions,  each 
tuned  for  slightly  different  parameter  values.  Instead,  we 
want  to  interpolate  responses  corresponding  to  any  point  in 
parameter  space  from  the  coefficients  corresponding  to  a 
finite  set  of  samples. 

Moreover,  we  would  like  to  continuously  vary  a  single 
parameter  while  leaving  the  others  fixed.  For  example,  for  a 
fixed  subband  (i.e. ,  fixed  orientation  and  scale),  one  should 
be  able  to  interpolate  subband  responses  at  any  spatial  posi¬ 
tion  from  the  coefficients  in  that  subband.  At  a  fixed  location 
and  scale,  one  should  be  able  to  interpolate  responses  at  any 
orientation  given  the  set  of  coefficients  corresponding  to  the 
oriented  basis  functions  at  that  location  and  scale.  And  for 
fixed  location  and  orientation,  one  should  be  able  to  compute 
responses  at  any  scale,  given  the  coefficients  of  basis  func¬ 
tions  of  different  scales  at  that  location  and  orientation. 

Note  that  interpolation  is  always  possible  when  the  trans¬ 
form  is  invertible:  one  can  simply  invert  the  transform  and 
then  compute  the  inner  product  of  the  signal  with  the  appro¬ 
priate  projection  function.  For  example,  one  could  interpolate 
between  samples  in  one  subband  of  a  wavelet  transform  by 
reconstructing  the  original  signal,  and  then  applying  the 
appropriate  projection  function  at  the  appropriate  position. 
But  one  cannot  in  general  interpolate  the  highpass  response 
using  only  the  highpass  coefficients,  due  to  the  aliasing  that 
occurs  during  the  subsampling  operation. 

A.  Shi f t ability  in  Position 

To  define  shiftability  more  concretely,  we  discuss  it  in  a 
context  that  is  familiar  to  most  readers.  Suppose  one  wishes 
to  analyze  a  one-dimensional  signal  f{x)  by  convolving  it 
with  a  linear  kernel  h(x).  The  convolution  output  is  to  be 
sampled:  that  is,  the  convolution  output  is  not  to  be  retained 
at  every  position.  We  do  not,  however,  want  the  sampling 
operation  to  affect  the  representation  (i.e.,  the  points  at 
sampled  locations  should  not  be  treated  differently  than  other 
locations  in  the  original  signal).  Specifically,  one  should  be 
able  to  compute  (interpolate)  the  full  convolution  output  from 
the  set  of  retained  samples.  It  is  well  known  that  the  lower 
bound  on  the  sampling  rate  that  meets  this  condition  is  the 
Nyquist  rate:  the  frequency  bandwidth  of  the  kernel  h(x) 
[33],  Thus,  in  the  spatial  domain,  shiftability  of  a  sampled 
representation  corresponds  to  the  lack  of  aliasing. 

We  derive  an  alternative  form  of  the  sampling  theorem  that 
makes  explicit  a  set  of  analytic  interpolation  functions  that 
can  be  used  to  perform  the  shifting  operation.  The  following 
discussion  is  restricted  to  continuous  input  signals,  f(x),  that 
can  be  expanded  in  a  Fourier  series  F(k)  (i.e.,  periodic  or 


compactly  supported  signals);  the  result  is  easily  extended  to 
discretely  sampled  signals.  For  notational  simplicity,  assume 
that  the  signal  period  is  2  7r.  A  set  of  N  transform  coeffi¬ 
cients,  y[n],  are  computed  by  convolving  with  a  kernel  h(  x) 
and  uniformly  sampling  the  output2  with  sample  spacing 
Ax  =  2ir  /N: 

y[n\  =  [  dxh(nAx  -  x)f(x), 

J  0 

«e{0,l,---,Af-  1}.  (3) 

This  is  illustrated  in  the  block  diagram  in  Fig.  2.  In  this 
linear  system,  the  projection  functions  correspond  to  a  set  of 
shifted  copies  of  the  kernel:  {h(nAx  -  x)  |  n  =  0,1,  •  •  •, 
N  -  1}. 

The  transformation  described  in  (3)  is  shiftable  if  there 
exists  a  set  of  interpolation  functions,  bn(  xa),  that  can  be 
used  to  compute  the  inner  product  of  the  signal  with  an 
arbitrarily  shifted  version  of  the  kernel  h(x)  as  a  weighted 
combination  of  the  y[n}\ 

[  dxh{xa  -  x)f(x)  =  £  b„{x0)y\n],  (4) 

'  0  «  =  0 

where  xa  is  the  arbitrary  shift  distance.  The  following 
proposition  (adapted  from  [15])  describes  a  condition  under 
which  the  transformation  is  shiftable,  and  gives  an  analytic 
expression  for  the  interpolation  functions. 

Proposition  1:  The  transformation  defined  by  equation  (3) 
is  shiftable,  if  and  only  if  there  exist  a  set  of  interpolation 
functions,  bn(xa),  that  satisfy  the  matrix  equation: 

j  eJxoka  \ 
eix0k i 

gjxokM-  i  j 

I  j  eJA*k ® 

1  eJA*k' 

1  1 

&o(*0)  \ 

*i(-0 

1  fyv-  l(*o)  I 

2  The  following  derivation  does  not  rely  on  the  uniformity  of  the  sam¬ 
pling,  but  the  translation  invariance  of  the  coefficient  power  (discussed  in  the 
next  section)  only  holds  in  the  case  of  uniform  sampling. 


ej2Axk0  ej(N-l)Axk0  \ 

ej2Axkl  .  .  .  gJW-  UA,*, 

eJ2AxkM_,  ej(N-l)AxkM_,  J 

forallx0,  (5) 
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f(x)  1 

h(x) 

y[n] 

A,=  2rt/N 

Fig.  2.  Block  diagram  illustrating  the  filtering  and  sampling  operation. 


where  {k0,  k kM_x]  is  the  set  of  frequencies  for 
which  H(k),  the  Fourier  series  of  h(x),  is  nonzero. 

Proof:  We  start  with  the  definition  of  shiftability  given 
in  (4): 


f  dxh(xa  -  x)f(x) 

Jo 

=  E  b„(x0)y[n\ 

n  =  0 

=  E  b„(x0)  [  dxf(x)h(nAx  -  x) 

n  =  0  ^  0 


'0 
N—  1 


/  dxf(x)  E  b„(xa)h(nAx  -  x). 

J  0  n  —  0 


In  order  for  this  expression  to  hold  for  any  signal  fix),  it 
must  be  true  that 


h(x0-x)=  E  bn{x0)h{nAx-  x). 


(6) 


That  is,  we  can  write  the  arbitrarily  shifted  kernel  h(x  —  xa) 
as  a  linear  combination  of  the  basis  functions  h(x  -  nAx) 
that  are  positioned  at  intervals  of  the  sample  spacing,  Ax. 
This  means  that  the  sampled  basis  set  spans  the  entire  sub¬ 
space  of  all  translations  of  the  kernel.  Equivalently,  the  linear 
subspace  spanned  by  the  sampled  basis  set  is  invariant  to 
translation. 

Since  we  are  assuming  f{x)  is  periodic,  we  may  assume, 
without  loss  of  generality,  that  h(x)  is  also  periodic.  Let 
H(k)  be  the  (complex)  Fourier  series  of  h(x).  Then  flipping 
the  sign  of  x  and  computing  the  Fourier  series  of  both  sides 
of  (6)  produces 

N-  1 

H{ k) eikx°  =  H(k)  Y,  bn{x0)eJkn*x, 

n-0 


for  all  x„ ,  k .  (7) 

This  is  a  set  of  linear  constraints  on  the  interpolation  func¬ 
tions,  b„(x0),  describing  the  phase-shifting  of  the  individual 
Fourier  components  of  h(  x)  necessary  to  translate  the  entire 
function.  If  a  solution  to  this  equation  exists,  then  the  repre¬ 
sentation  is  shiftable. 

Clearly,  (7)  imposes  a  constraint  on  the  interpolation  func¬ 
tions  only  for  those  frequencies  k  where  H(k)  is  nonzero.  If 
H(k)  is  nonzero  for  a  finite  set  of  frequencies  (or  if  fix) 
and  h{x)  are  discrete  signals),  this  constraint  may  be  written 
using  matrix  notation.  Let  { kQ,  kx,-  •  • ,  k M  _ ,  J  be  the  set  of 
frequencies  for  which  H(k)  is  nonzero.  Then  (7)  may  be 
written  as  the  matrix  equation  (5)  given  in  the  proposition. 


When  the  matrix  in  the  equation  is  invertible,  we  can  solve 
for  the  interpolation  functions  bn(x0),  and  the  representation 
will  therefore  be  shiftable.  □ 

The  relationship  of  the  constraint  in  (5)  to  the  Nyquist 
criterion  should  be  clear.  Invertibility  of  the  matrix  places  a 
lower  bound  on  the  number  of  samples,  N,  required  for 
shiftability.  In  particular,  the  number  of  samples  must  be 
greater  than  or  equal  to  the  number  of  nonzero  Fourier 
components  of  the  function  h(x),  as  discussed  in  [15].  For 
example,  consider  one  stage  of  a  discrete  dyadic  wavelet 
transform.  If  the  original  signal  is  of  length  (period)  N 
samples,  the  highpass  subband  computed  from  this  signal  will 
contain  N  / 2  samples.  The  kernels  are  designed  to  satisfy  the 
constraint: 


\H{k)\2  + 


H\k  + 


N 


=  2, 


for  each  k. 


Since  the  kernels  are  typically  not  ideal  (sine)  functions,  they 
must  contain  nonzero  frequency  components  corresponding 
to  more  than  7V/2  frequencies,  and  therefore  the  matrix  in 
equation  (5)  will  not  be  invertible. 

It  is  important  to  realize  that  except  for  the  specification  of 
which  Fourier  coefficients  are  nonzero,  the  interpolation 
functions  are  independent  of  the  kernel  h(  x).  This  means 
that  if  two  kernels  have  the  same  set  of  nonzero  Fourier 
components,  they  share  a  common  set  of  interpolation  func¬ 
tions. 

Note  that  if  the  matrix  in  (5)  is  not  invertible,  we  may  still 
find  a  least-squares  solution  using  the  matrix  pseudo-inverse. 
On  the  other  hand,  if  the  representation  is  overcomplete,  then 
the  interpolation  functions  will  not  be  unique.  In  this  case, 
we  can  again  solve  for  the  bn(x0)  using  the  pseudo-inverse 
(thus,  producing  a  minimum-norm  set  of  interpolators).  Al¬ 
ternatively,  one  may  prefer  to  use  a  set  of  interpolators  with 
another  property  such  as  minimal  region  of  support. 

B.  Shift-invariant  Response  Power 

In  an  aliased  transform,  the  response  power  depends  on  the 
signal  position:  translation  of  the  input  signal  generally  re¬ 
sults  in  a  redistribution  of  the  power  content  amongst  the 
various  frequency  subbands,  even  though  there  has  been  no 
change  in  the  frequency  spectrum  of  the  input.  This  is 
illustrated  in  Fig.  3.  We  show  a  fractal  signal,  and  the  power 
(sum  of  square  coefficients)  in  the  third  level  subband  of  a 
discrete  dyadic  wavelet  expansion  as  a  function  of  signal 
position.  As  the  signal  is  (circularly)  shifted,  the  power  in  the 
subband  oscillates  over  a  wide  range  of  values. 

The  shiftability  constraint  is  equivalent  to  the  constraint 
that  the  power  of  the  transform  coefficients  in  the  subband  is 
preserved  when  the  input  signal  is  shifted  in  position.  We 
show  this  in  the  following  proposition. 

Proposition  2:  Given  a  set  of  transform  coefficients  com¬ 
puted  using  (3),  the  transform  is  shiftable,  if  and  only  if  the 
power  of  the  coefficients,  EEolTlw]|2,  is  invariant  to 
translations  of  the  input  signal  f(x),  for  any  choice  of  input 
signal. 


592 


IEEE  TRANSACTIONS  ON  INFORMATION  THEORY,  VOL.  38,  NO.  2,  MARCH  1992 


(b) 


Fig.  3.  Illustration  of  the  dependency  of  subband  power  on  signal  position, 
(a)  Discrete  fractal  signal,  of  length  64.  We  constructed  a  three-level  wavelet 
pyramid  decomposition  of  this  signal,  using  the  four-tap  Daubechies  wavelet. 
This  was  done  for  64  (circularly)  translated  copies  of  the  input  signal,  (b) 
Plot  of  the  power  in  the  third  wavelet  subband  as  a  function  of  translation 
distance  of  the  input  signal.  Note  that  the  power  varies  substantially  as  the 
input  is  shifted,  although  translation  produces  no  change  in  the  spectral 
power  of  the  signal . 

Proof:  We  show  this  result  by  considering  the  discrete 
Fourier  transform  (DFT),  Y( k),  of  the  output  of  the  diagram 
in  Fig.  2.  Using  standard  facts  about  sampling,  we  may  write 

Y(k)  =  Y.H{k  +  IN)  ■  F(k  +  IN),  (8) 
/ 

where  F(k)  is  the  Fourier  series  of  the  input  signal.  Hi k)  is 
the  Fourier  series  of  the  convolution  kernel,  and  the  subscript 
I  ranges  over  the  set  of  integers.  The  summation  corresponds 
to  the  spectrum  replication  that  occurs  when  the  signal  is 
sampled. 

Assume  that  the  transform  is  shiftable.  Using  Parseval’s 
theorem,  we  write  the  power  of  the  output  signal  in  terms  of 
its  DFT: 

N-  1  ,  tV-  1 

£  |tM|2=  E  T(*)| 

n  —  0  Ar  =  0 

N-\  ,  .2 

=  E  \  Y.H{k  +  IN)  •F(k  +  IN)\ 

fc  =  0  I  1  I 

N-\ 

=  £  J2\H(k  +  IN)  ■  F(k  +  IN)\ 

k= 0  / 

N-  1 

+  £  £  H{k  +  IN)  -F(k  +  IN) 

k  =  0  l^m 

■  H*(k  +  mN)  ■  F*{k  +  mN).  (9) 


where  the  (•)*  operator  is  complex  conjugation,  and  the 
subscripts  /  and  m  range  over  all  integers.  Since  we  are 
assuming  shiftability  (i.e. ,  no  aliasing),  the  second  term  is 
zero.  The  remaining  expression  is  invariant  to  translations  of 
the  input  signal: 

e'ItMI2  =  e‘  £  \H(k  +  iN)\2\F{k  +  «V)|2. 

n= 0  k= 0  / 

(10) 

That  is,  replacing  F(k)  by  eJkx°  F(k)  for  any  xa  will  not 
alter  the  right  hand  side  of  expression  in  (10):  the  response 
power  is  translation  invariant. 

The  converse  is  also  true.  Assume  the  system  is  not 
shiftable:  then  the  sampling  causes  aliasing.  Then  a  sinu¬ 
soidal  input  signal  at  a  frequency  that  is  aliased  will  appear  in 
the  second  term  of  (9)  (i.e.,  there  will  be  some  k,  l,  and  m 
for  which  the  product  in  the  second  term  is  nonzero).  As  the 
input  signal  is  translated,  this  nonzero  second  term  will  be 
modulated  by  a  phase  factor.  □ 

In  situations  where  it  is  impossible  or  impractical  to  achieve 
shiftability  with  respect  to  a  parameter,  there  is  a  more  easily 
satisfied  constraint  that  may  be  used  in  its  place.  We  call  this 
property  “flat  basis  power”  with  respect  to  the  parameter.  In 
the  case  of  the  spatial  position  parameter,  this  means  that  the 
sum  of  the  power  of  the  basis  functions  for  one  subband  is 
constant  as  a  function  of  position.  This  property  is  a  conse¬ 
quence  of  the  shiftability  property,  but  the  converse  is  not 
true.  We  will  use  this  constraint  in  the  design  of  a  two- 
dimensional  transform  in  Section  V.  Here,  we  define  the 
property  and  show  that  it  is  a  direct  consequence  of  shiftabil¬ 
ity. 

Corollary  1;  If  the  transform  specified  by  (3)  is  shiftable, 
the  sum  of  squares  of  the  basis  functions,  Y.„h2n(x),  is 
constant  for  all  x. 

Proof:  We  show  this  result  by  considering  the  impulse 
input  signal  ( F(k )  =  e7“Xo).  Then  y[ n\  will  be  the  value  of 
the  sample  of  the  nth  basis  function  at  the  location  of  the 
impulse.  In  this  case,  the  left-hand  side  of  (10)  is  equal  to  the 
sum  of  squares  of  a  subset  of  the  kernel  samples  spaced  by 
the  sampling  distance: 

£  |.v[«]|2  =  £  | h(x0  -  nAx)\2. 

n~  0  n 

The  right-hand  side  of  (10)  is  the  sum  over  all  frequencies  of 
the  kernel  power  spectrum:  a  constant.  Thus,  the  sum  of 
squares  of  the  basis  functions  at  any  location  must  equal  the 
same  constant: 

£  \h(x  -  «AX)|2  =  c,  for  all  x.  □  (11) 

n 

We  emphasize  that  flat  basis  power  of  the  transform  is  a 
necessary  but  not  a  sufficient  condition  for  shiftability.  For 
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example,  consider  convolution  with  the  interval  integration  domain  function  V(ux,  uy)  may  be  written  separably  in 
kernel,  polar  coordinates  as 


( 0,  otherwise, 

followed  by  sampling  at  unit  intervals.  This  system  will 
clearly  have  flat  basis  function  power,  since  the  power  of 
each  basis  function  is  one  over  its  interval,  and  the  functions 
do  not  overlap.  But  this  set  is  not  shiftable,  since  the  kernel 
spectrum  is  infinite  in  extent  and.  therefore,  the  sampling  of 
the  basis  set  introduces  aliasing. 

C.  Shiftability  in  Orientation 

We  can  apply  the  concept  of  shiftability  to  the  continuous 
shifting  of  orientation.  Given  a  set  of  basis  functions  centered 
at  the  same  spatial  location  and  scale,  and  tuned  to  a  finite 
number  of  orientations,  we  wish  to  interpolate  measurements 
at  all  orientations.  This  problem  has  been  studied  in  detail  by 
Freeman  and  Adelson  [15],  who  described  a  general  frame¬ 
work  for  continuously  rotating  filters.  They  discuss  the  num¬ 
ber  of  rotated  basis  functions  required  to  “steer”  a  given 
filter  (i.e.,  the  number  of  samples  in  the  orientation  domain), 
and  derive  the  interpolation  formula  given  in  (5). 

To  illustrate  this  idea,  consider  the  set  of  directional 
derivatives  of  a  two-dimensional  radially  symmetric  function, 
c(r),  where  r  =  \f  x2  +  y2  is  the  radial  coordinate.  The 
directional  derivative  of  this  function  at  orientation  60  is 
written  as 


D«\cir)\  =  cos(6  ~  eo)c’(r). 


where  6  is  the  angular  coordinate  and  c'(r)  =  (d  /  dr)[c(r)], 
the  derivative  of  c(r)  with  respect  to  its  argument.  Note  that 
this  function  is  polar  separable,  a  product  of  angular  and 
radial  component  functions.  Using  standard  trigonometric 
relations,  we  can  rewrite  this  expression  in  the  form  of  the 
shiftability  constraint  in  (6): 


cos  (6  -  80)c’{r) 


cos(0o)  cos  ( 6)c'(r ) 

+  sin(0o)  sin  (O)c'(r) 
cos(0o)[cos(0)c'(r)] 

+  sin(0o)[cos  |  6  - 


Thus,  the  rotation  of  the  directional  derivative  to  an  arbitrary 
angle,  6n,  may  be  computed  as  a  linear  combination  of  the 
directional  derivative  basis  functions  at  angles  0  and  ir  /2  (in 
brackets).  The  basis  set  composed  of  these  two  derivatives  is 
shiftable  in  orientation. 

We  can  also  consider  steerability  in  the  Fourier  domain: 
That  is,  the  steerability  of  the  Fourier  transforms  of  the 
basis  functions.  The  steerability  condition  is  easiest  to  state 
for  polar-separable  functions.  Assume  that  the  frequency 


V(r,6)  =  H(8)U(r), 

where  r  and  8  are  the  polar  coordinates  of  the  frequency 
domain.  H(8)  is  clearly  periodic  (with  period  2rr),  and 
shifting  this  function  with  respect  to  its  argument  corresponds 
to  rotating  the  two-dimensional  basis  function  V(aix,  a>y). 
Thus,  shiftability  of  H(6)  corresponds  to  “steerability”  of 
V(ux,  co  ).  The  shiftability  constraint  is  now  written  in  terms 
of  the  Fourier  components  of  the  function  H(9).  Assuming 
that  the  transform  basis  set  is  composed  of  rotations  of  this 
function  at  a  sufficient  number  of  angles,  the  interpolation 
(steering)  functions  may  be  derived  from  (5). 

In  general,  we  may  also  consider  orientation  shiftability  of 
functions  that  are  not  polar-separable.  We  write  the  function 
in  terms  of  a  family  of  angular  functions,  Hr(6 ): 

V(r,6)  =  Hr(6)U{r). 

Recall  that  the  shiftability  constraint  in  equation  (5)  depends 
only  on  which  frequency  components  of  the  function  are 
nonzero;  it  does  not  depend  on  the  magnitude  of  those 
components.  Therefore,  we  can  express  the  shiftability  con¬ 
straint  in  terms  of  the  union  of  the  nonzero  frequency  compo¬ 
nents  of  Hr(6)  for  all  r. 

Analogous  to  the  case  of  position  shiftability,  orientation 
shiftability  ensures  that  the  sum  of  the  squares  of  the  trans¬ 
form  coefficients  at  a  given  location  and  scale  will  be  invari¬ 
ant  under  changes  in  the  input  image  orientation.  Consider, 
for  example,  a  sinusoidal  input  of  spatial  frequency  u0,  with 
unit  amplitude.  If  we  rotate  the  input  about  a  point  (i.e., 
modify  the  angular  coordinate  of  to,.).  the  response  power 
(sum  of  squares  of  the  coefficients)  at  that  point  will  not 
change. 

D.  Shiftability  in  Scale 

Application  of  the  shiftability  concept  to  scale  is  more 
difficult  both  because  of  the  symmetry  constraints,  and  be¬ 
cause  we  want  to  express  the  parameter  on  a  logarithmic 
axis.  For  simplicity,  we  consider  functions  that  are  purely 
symmetric  or  anti-symmetric  about  the  origin  in  the  fre¬ 
quency  domain.  We  handle  this  symmetry  constraint  by 
constructing  a  reflection-shiftable  function. 

Proposition  3:  Consider  a  shiftable  representation  based 
on  function  H(r ),  with  interpolation  functions  bn(r0).  Then 
we  can  compute  the  function  U±(r,  ra)  =  H(r  —  r(>)  ± 
Hi-  r  -  rn)  from  the  set  of  functions  U±(r,nAr)  = 
H(r  -  nAr)  +  Hi-  r  -  nAr ),  n  e  {0  •  •  •  N  -  1},  using 
the  same  interpolation  functions.  We  say  that  U+(r,r0)  is 
reflection-shiftable. 

Proof:  The  proof  is  simple.  We  assume  H(r)  is 
shiftable: 


H(r  -  ro)  =  £  bn{r0)H{r  -  nAr). 

n 
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Then, 

U±{r,  r0)  =  H(r  -  r0)  ±H(-r-  ra) 

=  E  b„(r0)H(r  -  nAr) 

n 

±  E  bn{r0)H(-r  -  nAr) 

n 

=  E  b„(rB)[H{r  -  nAr )  ±  H{-r  -  /tAr)] 

n 

=  Hb„(r0)U±(r,nAr). 

n 

This  is  the  desired  result.3  □ 

Typically,  signals  of  interest  will  be  bandlimited,  and, 
thus,  we  can  limit  our  domain  to  a  particular  range:  re 
[-a)max,  “maxi-  Since  we  have  developed  our  discussion  of 
shiftability  in  the  context  of  periodic  signals,  we  can  make 
the  domain  periodic  by  identifying  the  endpoints.  We  will  use 
this  approach  in  designing  a  scalable  transform  in  Section  IV. 

The  only  remaining  problem  is  that  the  reflection-shiftable 
kernels  previously  described  correspond  to  equal-width  fre¬ 
quency  subbands.  Consider  a  set  of  basis  functions  corre¬ 
sponding  to  dilations  of  a  common  kernel.  The  fourier  trans¬ 
forms  of  these  basis  functions  will  be  shifted  copies  of  the 
kernel  transform  in  log  frequency.  Thus,  we  would  like  to 
have  frequency  subbands  that  are  of  equal  size  on  a  logarith¬ 
mic  axis.  To  accomplish  this,  we  note  that  the  shiftability 
property  is  not  affected  by  “warping”  the  domain.  For 
example,  if  we  assume  that 

U(r,r0)  =  Y.bn(r0)U(r,nAr), 

n 

then  clearly 

u(p(r)>  ro)  =  E  b„(r0)U(p(r),  nAr), 

n 

where  p(r)  is  any  warping  function.  Note  that  this  warping 
operation  will  not  affect  the  flat  power  property  (i.e. ,  Corol¬ 
lary  1)  of  the  basis  functions. 

Since  we  are  interested  in  functions  that  are  tuned  for 
scale,  the  p(r)  =  log(r)  is  the  correct  warping  function.  In 
this  case,  the  singularity  at  the  origin  presents  a  problem  and 
so  it  is  necessary  to  modify  the  function  near  the  origin.  This 
does  not  affect  the  shiftability  of  the  basis  functions;  it  does, 
however,  mean  that  they  are  not  precisely  related  by  dilation 
operations.  An  example  (modified)  log  function  is  illustrated 
in  Fig.  4.  In  Section  IV,  we  develop  a  one-dimensional 
“scalable”  (i.e,,  shiftable  in  scale)  transform  using  a  loga¬ 
rithmic  warping  function. 

E.  Joint  Shiftability 

In  the  previous  sections,  we  considered  shiftability  of 
several  transform  parameters  independently.  It  is  natural  to 
ask  whether  a  transform  can  be  simultaneously  shiftable  in 

3  Note  that  the  flat  basis  power  condition  described  in  Corollary  1  will 
only  be  met  if  we  include  both  the  symmetric  and  the  anti-symmetric  basis 
functions  when  computing  the  power.  That  is,  the  sum  £„[t/+(r,  nAr)2  + 
U_(r,  nAr)2]  -  2T„[H(r  -  nAr )2  +  H(-r  -  nA,)2]  will  be  constant 
if  H(r)  is  shiftable. 


Fig.  4.  Illustration  of  the  use  of  a  warping  function.  The  function  is  a 
modified  logarithm:  for  argument  values  near  the  origin  it  is  Linear.  Along 
the  top  are  the  set  of  warped  subbands.  Kernels  corresponding  to  these 
subbands  are  only  approximately  related  by  dilations:  modification  near  the 
origin  prevents  them  from  being  exact.  Along  the  left  is  a  shiftable  set  of 
functions.  The  same  interpolation  functions  for  the  shiftable  set  may  be  used 
to  shift  the  wapped  functions  along  the  warped  frequency  axis. 

several  parameters.  By  this,  we  mean  the  representation  is 
shiftable  in  each  parameter  while  all  other  parameters  are 
held  fixed.  For  different  applications,  it  may  be  desirable  to 
design  transforms  that  are  shiftable  in  different  subsets  of 
parameters.  We  discuss  some  of  the  basic  issues  in  joint 
shiftability;  a  full  analysis  of  the  topic  is  beyond  the  scope  of 
this  paper. 

We  first  note  that  for  parameters  that  are  independent,  joint 
shiftability  can  be  achieved.  For  example,  one  can  create  a 
two-dimensional  transform  that  is  shiftable  in  both  x  and  y 
position.  These  two  parameters  may  be  treated  indepen¬ 
dently,  and  the  shifting  may  be  performed  separably.  As  an 
example  of  this,  in  Section  V  we  discuss  the  implementation 
of  a  transform  that  is  jointly  shiftable  in  two-dimensional 
position  and  orientation. 

If,  however,  we  consider  parameters  that  are  Fourier 
complements  of  each  other,  there  are  difficulties.  For  exam¬ 
ple,  consider  a  one-dimensional  transform  with  basis  func¬ 
tions  parameterized  for  position  and  scale.  Position  shiftabil¬ 
ity  (with  spatial  subsampling)  requires  the  basis  functions  to 
have  limited  bandwidth  (regions  of  support)  in  the  Fourier 
domain.  This  would  imply  that  the  basis  functions  had  infinite 
spatial  regions  of  support  in  the  spatial  domain.  On  the  other 
hand,  scale  shiftability  (with  scale  subsampling)  requires 
them  to  have  compact  regions  of  support  in  the  spatial 
domain.  Thus  it  is  impossible  to  design  a  transform  that  is 
subsampled  in  two  Fourier  complementary  domains,  and 
shiftable  in  both  of  those  domains. 

It  should  now  be  clear  that  the  critical  sampling  of  orthog¬ 
onal  wavelet  transforms  prevents  them  from  being  shiftable 
in  both  position  and  scale.  If  we  use  an  ideal  (sine)  kernel  in 
a  dyadic  wavelet  transform,  the  subbands  will  be  spatially 
shiftable.  But  they  will  be  infinite  in  spatial  extent  and  highly 
non-shiftable  in  scale. 

There  are  several  possible  solutions  to  this  dilemma.  The 
simplest  is  to  maintain  full  resolution  in  one  of  the  parame¬ 
ters.  In  this  case,  there  are  no  compactness  restrictions  on  the 
complementary  parameter.  For  subband  transforms,  this  in¬ 
efficiency  may  be  unacceptable  for  many  applications.  Note, 
however,  that  the  Fourier  basis  itself  may  be  considered  to  be 
jointly  shiftable  in  this  sense.  For  each  frequency  c c  there  are 
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two  spatially  translated  basis  functions  (sin  (ux)  and 
cos  (a>*)).  Each  of  these  pairs  is  shiftable  (one  can  compute  a 
sinusoid  of  this  frequency  at  any  phase  from  these  two  basis 
functions).  The  transform  is  maintained  at  full  density  in  the 
frequency  domain  (by  definition),  and  so  is  trivially  shiftable 
there. 

This  example  also  suggests  a  connection  between  the 
shiftability  concept  and  eigensystem  analysis.  The  Fourier 
basis  functions  are  the  eigenfunctions  of  the  translation  opera¬ 
tor,  when  the  basis  functions  are  considered  as  complex 
pairs.  This  means  that  the  two-dimensional  subspace  spanned 
by  the  two  basis  functions  of  a  given  frequency  is  invariant 
under  translations.  Analogously,  a  set  of  basis  functions 
shiftable  in  a  parameter  forms  a  multidimensional  eigenfunc¬ 
tion  for  translations  with  respect  to  that  parameter.  The  space 
spanned  by  these  basis  functions  is  invariant  to  modifications 
of  the  parameter. 

Returning  to  the  issue  of  joint  shiftability,  one  can  also 
consider  nonseparable  shiftability,  in  which  the  interpolation 
of  a  point  in  the  parameter  space  is  based  on  a  set  of  sample 
points  not  restricted  to  variations  of  a  single  parameter.  In 
this  case,  the  interpolation  functions  are  multi-dimensional. 
This  is  true,  for  example,  in  a  wavelet  transform:  One  can 
interpolate  any  point  in  the  scale-position  parameter  space 
from  a  set  of  surrounding  scale  and  position  samples.  This 
was  done  in  a  recent  paper  by  Gopinath  and  Burrus  [18].  In 
the  present  paper,  however,  we  have  emphasized  the  advan¬ 
tages  of  being  able  to  shift  independently,  so  that  one  param¬ 
eter  may  be  shifted  while  holding  all  others  fixed.  This 
property  is  important  in  many  signal  analysis  applications,  as 
we  discuss  in  later  sections. 

Another  alternative  is  to  design  an  approximately  shiftable 
representation.  This  would  require  introduction  of  a  measure 
of  joint  nonshiftability  (joint  aliasing);  basis  functions  could 
then  be  designed  to  minimize  this  measure,  given  a  fixed 
sampling  structure.  Note  that  such  a  measure  would  necessar¬ 
ily  be  a  measure  of  joint  localization,  as  in  Gabor’s  work. 
But  since  it  would  be  based  on  region  of  support,  it  would 
lead  to  different  sets  of  “optimal”  basis  functions.  Note  also 
that  this  measure  would  be  dependent  on  the  sampling  density 
chosen:  the  joint  shiftability  constraint  becomes  less  restric¬ 
tive  as  one  increases  the  number  of  transform  samples. 

Finally,  given  that  joint  shiftability  is  impossible  to  achieve 
in  a  pair  Fourier  complementary  domains,  we  can  also 
consider  relaxing  the  shiftability  constraint  in  one  of  the 
domains.  In  particular,  we  can  replace  the  shiftability  con¬ 
straint  in  one  domain  by  the  less  restrictive  constraint  that  the 
basis  function  power  is  flat  (see  Corollary  1).  In  the  next 
section,  we  show  that  this  will  result  in  a  self-invertible 
transform. 

F.  Self-Invertibility 

Orthonormal  transforms  are  self-inverting,  but  self-invert¬ 
ing  transforms  need  not  be  orthonormal.  As  we  stated  in 
Section  II  orthonormality  is  the  combination  of  self-inversion 
and  linear  independence.  One  can  construct  transforms  which 
are  self-inverting,  yet  overcomplete.  Daubechies  has  termed 
these  “tight  frames”  [8], 
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A  trivial  example  serves  to  demonstrate  that  such  trans¬ 
forms  exist.  A  nonorthogonal,  self-inverting  transform  may 
be  formed  from  any  orthogonal  transform  by  duplicating 
every  projection  function  and  dividing  the  kernel  weights  by 
a  factor  of  \/2  .  The  result  is  an  over-complete  representation 
in  which  each  pair  of  identical  projection  functions  encodes 
the  same  information.  But  the  transform  remains  self-invert¬ 
ing:  the  basis  and  projection  functions  are  identical.  A  less 
trivial  example  of  an  overcomplete  self-in  verting  transform 
based  on  an  overcomplete  Gabor  set  is  described  by  Simon- 
celli  [39],  The  “cortex”  transform  developed  by  Watson 
[47]  is  another  example  of  an  overcomplete  self-inverting 
transform.  The  overcomplete  steerable  pyramid  described  in 
Section  V  is  also  a  self-inverting  transform. 

Intuitively,  the  importance  of  self-invertibility  is  that  it 
ascribes  a  natural  meaning  to  the  transform  coefficients.  Each 
coefficient  is  computed  using  a  projection  function  with  a 
particular  position  and  shape  in  both  the  spatial  and  fre¬ 
quency  domains.  The  basis  functions  used  to  invert  the 
transform  are  the  same.  This  can  be  important  for  many 
applications,  such  as  data  compression  and  image  enhance¬ 
ment.  Without  this  property,  errors  introduced  by  nonlinear 
processing  of  the  coefficients  will  spread  to  locations  and 
frequencies  other  than  those  that  were  used  to  compute  the 
coefficients. 

In  previous  sections,  we  have  concentrated  on  one  trans¬ 
form  parameter  at  a  time.  Now  we  must  consider  the  trans¬ 
form  as  a  whole.  We  restrict  our  attention  to  the  class  of 
transforms  that  may  be  described  by  analysis/synthesis  filter 
banks,  as  illustrated  in  Fig.  5.  In  fact,  this  structure  is 
general  enough  to  represent  any  linear  transform,  but  we  will 
have  in  mind  a  subband  transform  such  as  the  wavelet 
transform. 

As  before,  we  will  assume  that  the  input  signal,  /(*),  is 
periodic  with  period  2ir.  In  addition,  we  will  need  to  assume 
that  the  input  signal  is  band-limited  to  a  finite  range  of 
frequencies,  /re  [-Xrmax,  ^rmax].  In  the  system  depicted  in 
Fig.  5,  the  input  signal  is  convolved  with  a  set  of  M  kernels, 
hm{x).  The  index,  m,  corresponds  to  the  frequency  subband 
of  the  kernel.  The  output  of  each  convolution  is  sampled  at 
Nm  points  at  intervals  of  2  tt  j  Nm .  This  produces  a  set  of 
transform  coefficient  sequences,  ym[n]. 

In  the  synthesis  section,  the  coefficients  are  reconstituted  as 
a  continuous  signal  by  attaching  them  to  Dirac  delta  functions 
at  the  proper  sample  locations.  That  is,  from  the  discrete 
signal  ym[n\,  we  form  a  continuous  signal  ym( x)  = 
Tm[«]  ‘  o(x  -  nAx).  These  signals  are  then  con¬ 
volved  with  the  space-reversed  kernels,  gm(  —  x).  The  nega¬ 
tion  of  the  argument  allows  us  to  be  consistent  about  the 
definitions  of  basis  and  projection  functions.  Given  this  struc¬ 
ture,  we  can  show  the  following. 

Proposition  4:  A  subband  transform  that  is  spatially 
shiftable  in  each  subband  and  exhibits  flat  basis  function 
power  in  the  complementary  Fourier  domain  over  all  fre¬ 
quencies  of  interest  is  self-inverting. 

Proof:  The  proof  is  by  construction  in  one  dimension; 
the  extension  to  higher  dimensions  is  straightforward.  We 
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Fig.  5.  Analysis/synthesis  filter  bank. 


assume  that  the  conditions  of  the  proposition  hold,  and  that 
the  projection  filters  are  equal  to  the  basis  filters:  gm(x)  = 
hm(x).  Then  we  must  show  that  the  system  in  Fig.  5  is  an 
identity  system. 

Using  (8),  the  output  of  the  entire  system  may  be  written 
in  the  frequency  domain 

Hk) 

M-  1 

=  E 

m  =  0 
M  -  1 

=  E  Hm(k)Hj-k)  -  F(k) 

m  =  0 

M  -  1  Nm-  1 

+  E  E  Hm{k  +  lNm)H,(~k)F(k  +  lNm). 

m  =  0  /=1 

(12) 


Nm  —  1 

E  Hn 

1=0 


(k  +  lNm)  ■  F(k  +  lNm) 


The  second  sum  in  this  equation  is  the  aliasing  term.  Since 
we  are  assuming  that  the  system  is  shiftable  (aliasing-free)  in 
the  spatial  domain,  the  second  sum  evaluates  to  zero.  The 
remaining  (shift-invariant)  system  response  is  then 


Hk)  = 


M-  l 

E  \Hm{k) 

m  =  0 


■F(k). 


Now,  if  the  set  of  Hm(k)  are  designed  to  have  flat  power  as 
in  (11),  then 


M-  1 

E  \Hm{k)\  =  c,  for  all  k. 

m  =  0 


If  we  impose  the  additional  requirement  that  the  value  of  the 
constant  c  is  unity  (this  is  just  an  adjustment  of  the  normal¬ 
ization  of  the  basis  functions),  then  the  overall  system  re¬ 
sponse  will  be  unity,  and  the  representation  is  self-inverting. 

□ 


We  will  use  this  result  in  Section  V  to  construct  a  self- 
inverting  pyramid  transform  that  is  shiftable  in  space  and 
orientation. 


IV.  A  One-Dimensional  Example 

In  this  section,  we  describe  a  one-dimensional  transform 
that  is  approximately  shiftable  in  scale  and  position  and  we 
use  it  in  a  simple  example  application.  We  designed  a 
shiftable  function,  H(r),  by  fitting  an  inverted  parabola  with 
one  period  of  a  five-term  Fourier  series.  We  then  constructed 
a  set  of  six  (frequency  domain)  basis  functions,  U(r ,  «ir/3). 


Fig.  6.  Frequency  response  of  two  of  the  scale  basis  functions  (inner  and 
outer  curves),  along  with  the  frequency  response  of  an  interpolated  basis 
function. 


using  the  techniques  described  in  the  previous  section.  In 
particular,  we  warped  the  frequency  axis  using  a  modified 
log  function: 

p(r)  =  sgn  ( r )  •  log  (a]  r\  +  1), 

where  a  is  a  normalization  factor  a  =  (eT  —  l)/ir.  This 
modification  to  the  log  function  allows  us  to  handle  the 
singularity  at  the  origin.  It  does  not  effect  the  shiftability  of 
the  basis  functions,  although  the  basis  functions  will  no 
longer  be  exact  dilations  of  each  other. 

To  form  a  basis  set,  we  combined  the  warped  functions  in 
a  reflection-symmetric  manner: 


ne{- 2, 


nir  \ 

T )’ 

1,0, 1,2}. 


Two  of  the  basis  functions,  whose  Fourier  transforms  are 
tuned  for  different  frequency  octaves,  are  illustrated  in  Fig. 
6.  We  also  show  a  basis  function  that  has  been  interpolated 
(shifted)  to  an  intermediate  scale.  The  kernels  used  for 
convolution  in  the  spatial  domain  are  computed  as  the  Fourier 
transforms  of  the  U(r,  mr  / 3). 


A.  Scale-Space  Decomposition 

To  illustrate  the  use  of  this  set  of  scalable  filters  discussed 
in  the  previous  section,  we  show  an  efficient  calculation  of  a 
“scale-space”  image  [48],  Ordinarily,  this  calculation  would 
require  convolution  of  the  signal  with  a  large  battery  of 
filters,  each  tuned  for  a  slightly  different  scale.  The  property 
of  scale  shiftability  allows  us  to  efficiently  compute  the  signal 
at  all  scales,  using  only  a  small  set  of  filters. 

We  first  convolve  a  one-dimensional  signal  with  each  of 
our  scale  filters,  producing  a  set  of  six  one-dimensional 
signals  y„(x),  n  e  {0,1,  •  •  • ,  5}.  As  mentioned  in  the  previ¬ 
ous  section,  we  must  retain  the  full  sampling  density  of  the 
convolution  operations  in  order  for  the  subbands  to  be  spa¬ 
tially  shiftable.  We  computed  a  set  of  interpolation  functions 
using  (5),  and  interpolated  a  set  of  one-dimensional  signals  at 
intermediate  scales  by  taking  a  weighted  combination  of  the 
y„(x). 

The  results  for  a  fractal  signal  are  shown  in  Fig.  7.  We 
show  the  original  signal  (a)  and  four  of  the  six  “basis” 
signals  computed  by  convolving  with  each  of  the  basis  filters 
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Fig.  7.  Example  scale-space  decomposition,  (a)  Original  signal.  Brownian  fractal,  (b)-(e)  Four  of  the  six  “scale  signals" 
resulting  from  applying  each  of  the  scale  basis  filters  to  the  original  fractal  signal,  (f)  Zero-crossings  of  the  interpolated 
“scale-space”  image  (see  text). 


(b)-(e).  From  the  set  of  basis  signals,  we  can  interpolate  a 
scale  signal  at  any  intermediate  scale.  To  illustrate  this,  we 
interpolated  scale  signals  at  128  different  scales  ranging  from 
that  of  the  lowest-frequency  basis  scale  to  the  highest- 
frequcncy  basis  scale.  These  were  pasted  together  as  the  scan 
lines  of  a  two-dimensional  “scale-space”  image,  in  which 
the  horizontal  axis  corresponds  to  spatial  position,  and  the 
vertical  axis  corresponds  to  scale.  We  then  applied  a  simple 
zero-crossing  detector  to  this  image.  Fig.  7(f)  shows  the 
zero-crossings  of  the  scale-space  function.  Note  that  the 


horizontal  density  of  zero-crossings  decreases  as  the  scale 
becomes  coarser. 

V.  Two-Dimf;nsional  Example:  A  Steerable  Pyramid 

We  have  designed  and  implemented  a  two-dimensional 
transform  that  is  jointly  shiftable  in  orientation  and  position 
(see  also  [15]).  The  basis  functions  are  translations,  dilations, 
and  rotations  of  a  single  kernel,  and  the  transform  is  con¬ 
structed  as  a  recursive  pyramid.  Fig.  8  contains  an  illustra¬ 
tion  of  the  frequency  domain  decomposition  performed  by 
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Fig.  8.  Illustration  of  the  spectral  decomposition  performed  by  the  steer¬ 
able  pyramid.  Basis  functions  are  related  by  dilations  and  rotations  (except 
for  the  inner  lowpass  subband,  and  the  outer  residual  subband). 

the  transform.  Shiftability  in  scale  was  not  needed  for  our 
intended  applications,  and  is  not  a  property  of  this  transform. 
Our  transform  does,  however,  have  flat  basis  power  in  the 
scale  domain,  thus,  it  is  self-inverting  (see  Corollary  1). 
Because  it  is  shiftable  in  orientation,  we  call  it  a  “steerable 
pyramid.” 

Fig.  9(a)  illustrates  the  orientation-shiftable  basis  filters  at 
one  scale  of  the  pyramid.  The  design  of  the  filters  is  de¬ 
scribed  in  the  next  section.  The  rest  of  the  figure  shows  the 
decomposition  of  a  simple  test  image.  Unlike  the  orthogonal 
pyramids  based  on  QMF  filter  banks,  the  steerable  pyramid 
is  significantly  overcomplete:  there  are  16/3  times  as  many 
coefficients  in  the  representation  as  in  the  original  image.  The 
overcompleteness  limits  the  computational  efficiency  but  in¬ 
creases  its  usefulness  for  many  image  processing  and  analysis 
tasks,  as  we  will  show  in  Section  VI. 

A.  Pyramid  (Radial  Frequency)  Implementation 

We  used  a  polar-separable  frequency  domain  design  strat¬ 
egy  for  the  filters  of  the  transform.  In  this  section,  we 
described  the  radial  frequency  (scale)  portion  of  the  design. 

Pyramid  algorithms  are  based  on  recursive  application  of 
filtering  and  subsampling  operations.  Typically,  the  input 
signal  is  partitioned  into  low-  and  high-pass  portions,  the 
low-pass  portion  is  subsampled,  and  the  subdivision  is  re¬ 
peated  recursively.  The  high-pass  portion  may  be  subsampled 
as  in  wavelet  pyramids,  or  left  at  full  density  as  in  the 
Laplacian  pyramid  [5].  A  single  stage  of  the  transform  may 
be  written  in  the  form  of  a  standard  analysis/synthesis  filter 
bank  [41]  (although  the  Laplacian  pyramid  is  typically  not 
implemented  using  this  architecture).  The  filters  are  chosen 
such  that  this  single-stage  system  behaves  like  an  identity 
operation.  The  pyramid  structure  is  achieved  by  applying  this 
single-stage  transform  recursively  to  the  low-pass  subband  of 
the  previous  single-stage  transform. 

In  the  present  case,  we  wish  to  subdivide  the  signal  into 
low-pass  and  bandpass  portions.  To  achieve  this  decomposi¬ 
tion,  we  have  implemented  a  novel  pyramid  architecture,  in 
which  the  response  of  the  single-stage  system  (and  therefore 
the  entire  pyramid)  is  low-pass.  The  block  diagram  defining 
the  pyramid  recursion  is  shown  in  Fig.  10.  The  input  signal 
is  convolved  with  a  bandpass  kernel,  B(u),  and  a  low-pass 
kernel  L,(u)).  To  ensure  that  there  is  no  aliasing  in  the 
bandpass  portion,  it  is  not  subsampled.  The  low-pass  portion 


is  subsampled  by  a  factor  of  two,  and  then  convolved  with 
another  low-pass  kernel,  L0( co). 

Using  standard  signal  processing  results,  and  assuming  that 
the  subsampling  of  the  low-pass  branch  introduces  negligible 
aliasing,  we  write  the  response  of  the  system  of  Fig.  10  as 

S(«)  =  I  -B(co)  1 2  +  |L1(u)!2|L0(2u)|2  (13) 

Since  B( a>)  is  bandpass,  and  the  /,,(o>)  are  low-pass,  S(oi) 
has  a  low-pass  characteristic.  Therefore,  a  high-pass  residue 
band  must  be  computed  by  convolving  with  the  filter 

R(u)  =  1  -  S(w). 

Alternatively,  the  original  image  can  be  initially  upsampled 
to  eliminate  all  frequency  components  that  would  be  passed 
by  R(a i). 

In  order  to  cascade  the  system  recursively,  we  must  be 
able  to  replace  some  portion  of  the  diagram  with  the  entire 
system.  We  therefore  require  that  S(oj)  =  |£0(u))|2,  thus 
allowing  the  diagram  to  be  recursively  cascaded  as  illustrated 
in  Fig.  11.  The  resulting  constraint  on  the  bandpass  filter 
B(  oj)  is 

I  L0(u)  |  2  =  |  B(u)  |  2  +  |  T-i(w)  1 2  I  Lo(2“)  |  2  ■  (14) 

This  recursion  contraint  is  used  in  the  design  of  the  radial 
filters. 

One  other  constraint  on  the  radial  filter  design  is  that  the 
subsampling  operation  should  not  introduce  significant  alias¬ 
ing  in  the  low-pass  branch.  It  would  seem  that  this  constrains 
the  low-pass  filter  Z,,(ai)  to  have  strictly  zero  response  above 
w  =  tt/2.  In  practice,  the  restriction  is  less  severe.  The 
lowpass  filter  L0(u >)  that  follows  the  subsampling  operation 
removes  most  aliased  components,  which  are  high  frequency 
in  the  subsampled  domain.  Therefore,  we  used  a  seven-tap 
binomial  low-pass  filter  that  is  fairly  gentle  in  the  frequency 
domain:  /,[«]  =  1  /64  ■  [1,6,  15,  20,  15,  6,  1], 

We  also  have  freedom  to  choose  the  L0(u)  filter,  or 
equivalently,  the  system  response  S(oj).  Since  it  represents 
the  lowpass  response  of  the  overall  system,  we  require  that  it 
be  unity  from  0  to  ir  /2  radians,  and  zero  at  <o  =  tt.  Using 
the  Parks -McClellan  algorithm,  we  constructed  the  13-tap 
filter  that  best  meets  these  criteria. 

Having  specified  the  two  lowpass  filters,  the  bandpass  filter 
is  constrained  by  the  recursion  relation  given  in  (14).  We 
designed  a  symmetric  15-tap  bandpass  filter  that  minimizes 
the  maximum  error  amplitude.  We  used  a  simplex  algorithm 
[38]  to  search  the  eight-dimensional  space  of  free  parameters. 
The  result  is  the  bandpass  filter  response  shown  in  Fig.  13(a). 
The  maximum  power  deviation  from  the  desired  frequency 
response  is  roughly  3.5  percent. 

B.  Angular  Frequency  Component  Design 

We  chose  an  angular  frequency  response  H(8)  = 
j cos3(0).  This  can  be  expressed  in  terms  of  sinusoidal  har¬ 
monics  through  use  of  a  standard  trigonometric  identity: 

1  3 

cos3(0)  =  —  cos  (39)  +  —  cos  (0) . 
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Fig.  9.  Illustration  of  the  structure  of  the  steerable  pyramid,  (a)  Filter  kernels  used  to  construct  the  pyramid.  These  four  filters, 
which  are  rotated  versions  of  a  common  filter,  span  the  space  of  all  rotations  of  that  common  filter,  (b)  Test  image,  (c)  Low-pass 
coefficient  image  in  the  pyramid  representation  of  the  test  image,  (d)-(f)  Bandpass  coefficient  images  in  pyramid  representation. 


steerable,  self-inverting  pyramid.  The  impulse  response  of  the  entire  system 

(shown  at  the  top)  equals  the  power  spectrum  of  the  filter  L0(o>).  This  allows  Fig.  11.  Illustration  of  a  two-stage  recursive  cascade  of  the  system  shown 
us  to  cascade  the  system  recursively,  as  illustrated  in  Fig.  11.  to  create  a  in  Fig.  10. 

multiscale  transform. 


Then  the  number  of  angular  basis  functions  required  for 
shiftability  and  the  interpolation  functions  are  determined  by 
(5).  Solving  for  this  case  gives  the  exact  interpolation  func¬ 
tions, 

b„{8)  =  —  [2 cos  (6  -  nAt)  +  2 cos  (3(0  -  «Ae))]  (15) 

where  Ae  =  7r/4,  n  e  {0,  1, 2,  3} . 

One-dimensional  plots  of  the  angular  Fourier  components 
of  the  four  angular  basis  functions,  and  their  power  spectra, 
are  shown  in  Fig.  12.  A  linear  combination  of  the  four  basis 
functions  in  Fig.  12(a)  can  synthesize  an  arbitrary  angular 
translation  of  the  cos3(0)  kernel. 


For  some  applications,  we  need  to  employ  quadrature  pairs 
of  filters  that  have  the  same  frequency  response  but  differ  in 
phase  by  ir/2  radians  (i.e. ,  pairs  of  filters  that  are  Flilbert 
transforms  of  each  other  [4]).  Since  the  cos3(0)  frequency 
response  is  odd-symmetric,  its  Hilbert  transform  is 
|  cos3(0)  | .  To  make  a  steerable  set  of  the  quadrature  comple¬ 
ments  of  the  bandpass  filters,  we  use  the  first  three  terms  in  a 
Fourier  expansion  of  the  Hilbert  transform: 

|  cos3 (0 )  [  =  0.4244  +  0.5093  cos  (20)  +  0.0727 cos  (40) . 

(16) 

This  angular  function  requires  five  basis  functions  for  shifta¬ 
bility.  We  note  that  since  the  power  spectra  of  both  filters  of 
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Fig.  12.  (a)  Set  of  four  basis  functions  of  the  form  cos3(0  -  n A„),  plotted 

from  0  to  2  tr.  This  set  of  basis  functions  is  sufficient  to  synthesize  cos  'iO  — 
0O),  for  any  0O.  (b)  Basis  function  power  as  a  function  of  angle. 


the  quadrature  pair  are  identical,  either  set  of  filters  can  be 
used  in  the  implementation  of  the  pyramid. 

C.  Two-dimensional  Filter  Design 

Using  the  frequency  transformation  method  [26],  we  con¬ 
verted  the  one-dimensional  radial  filters  into  two-dimensional 
filters.  Fig.  13(b)  shows  the  two-dimensional  bandpass  filter. 
The  frequency  transformation  method  produces  two-dimen¬ 
sional  filters  with  only  approximate  circular  symmetry,  but 
the  approximation  is  quite  accurate  over  the  passband  of  our 
filters. 

To  construct  a  set  of  oriented  filters,  we  subdivide  the 
annular  bandpass  spectrum  into  orientation  subbands.  The 
angular  variation  is  sufficiently  slow  to  use  the  frequency 
sampling  method  [33],  We  computed  the  Fourier  transform 
of  the  bandpass  kernel,  multiplied  by  the  four  desired  angular 
responses,  cos3 (6  -  nAe),  and  computed  the  inverse  Fourier 
transform  to  obtain  the  basis  filter  impulse  responses.  Fig. 
13(c)— ff)  show  the  frequency  responses  of  the  bandpass 
filters. 

We  can  now  see  that  the  steerable  pyramid  transform  is 
(approximately)  self-inverting  over  the  lowpass  range  of  ra¬ 
dial  frequencies  u)r  e  [0,  7r  /2] .  When  applying  the  oriented 
filters  in  two  dimensions,  the  convolution  results  are  not 


spatially  subsampled.  These  subbands  are  therefore  spatially 
shiftable.  The  low -pass  radial  filters  were  designed  to  prevent 
aliasing,  and  so  the  subsampled  low-pass  signal  is  also  spa¬ 
tially  shiftable.  In  the  frequency  domain,  both  the  angular 
and  radial  component  designs  ensure  that  the  sum  of  squares 
of  the  basis  functions  is  constant  with  respect  to  orientation 
and  over  the  relavant  range  of  scales  (i.e. ,  over  the  passband 
of  L0(oj)).  Therefore,  by  Proposition  4,  the  pyramid  is  self- 
inverting. 


VI.  Image  Processing  Applications 

Since  the  steerable  pyramid  is  jointly  shiftable  in  orienta¬ 
tion  and  position,  one  can  perform  useful  image  analysis  and 
manipulation  directly  on  the  transform  representation.  In 
[15],  the  steerable  pyramid  representation  was  used  to  infer 
surface  shape  from  image  intensities  (known  as  the  “shape- 
from-shading"  problem).  Here,  we  demonstrate  its  use  in 
two  more  applications. 

A.  Stereo  Matching 

In  the  stereo  matching  problem,  the  visual  system  is  con¬ 
fronted  with  two  views  of  a  scene  from  differing  positions 
(left  and  right  eyes).  The  task  is  to  find  the  relative  horizontal 
displacement  between  corresponding  points  in  the  images. 
This  displacement  is  inversely  proportional  to  the  distance  to 
the  point  in  the  three-dimensional  world,  and  so  may  be  used 
to  recover  a  “depth  map"  for  the  image.  Because  matching 
is  simplest  when  there  are  small  displacements  between  the 
two  images,  it  is  common  to  apply  stereo  matching  algo¬ 
rithms  within  a  multiresolution  representation.  The  matching 
is  first  performed  at  coarse  spatial  scales,  where  all  displace¬ 
ments  are  small  relative  to  the  distance  between  pixels,  and 
the  results  are  then  used  as  initial  estimates  for  progressively 
finer  spatial  scales.  Thus,  at  each  stage  of  the  computation, 
the  matcher  operates  on  measurements  at  the  current  scale, 
and  estimates  from  the  previous  scales. 

Since  the  steerable  pyramid  is  a  multi-scale  representation, 
it  may  be  used  to  implement  a  coarse-to-fine  disparity  estima¬ 
tor.  More  importantly,  the  spatial  shiftability  of  the  subbands 
allows  accurate  displacement  estimation  directly  from  the 
subband  coefficients.  Without  the  spatial  shiftability  property, 
a  uniform  translation  of  both  images  would  transfer  coeffi¬ 
cient  energy  into  other  orientation  and  frequency  (scale) 
bands.  Thus  a  disparity  estimator  operating  on  a  subband 
would  be  affected  by  the  absolute  spatial  positions  of  the 
images. 

Fig.  14  shows  an  example.  Fig.  14(a)  shows  the  left  and 
right  eye  views  of  a  random  dot  stereogram  [21].  A  central 
rectangle  of  dots  is  displaced  horizontally  by  one  pixel  be¬ 
tween  the  left  and  right  eye  views.  Fig.  14(b)  shows  an 
overlaid  plot  of  cross-sections  of  the  same  horizontal  line  of 
the  two  images;  (c)  shows  a  surface  plot  of  the  depth  map 
corresponding  to  this  stereo  pair. 

We  used  this  test  stimulus  to  compare  the  performance  of 
the  steerable  pyramid  and  a  wavelet  image  representation  in 
stereo  matching.  We  compare  the  disparity  estimates  after 
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Fig.  13.  Frequency  domain  filter  response  plots,  illustrating  design  procedure  for  digital  steerable  filter,  (a)  Desired  radial 
frequency  distribution,  plotted  from  0  to  n.  (b)  Desired  angularly  symmetric  two-dimensonal  frequency  response,  obtained  through 
frequency  transformation.  The  profile  in  (b)  was  multiplied  by  the  desired  cos3(0  -  nA0)  angular  frequency  responses  and  inverse 
transformed  to  yield  the  steerable  basis  set.  (c)-(f)  Imaginary  component  of  the  frequency  responses  of  the  resulting  steerable 
filters. 


one  stage  in  a  coarse-to-fine  algorithm  (i.e.,  we  compute 
stereo  disparity  at  a  single  spatial  scale). 

We  use  a  least-squares  gradient-based  displacement  estima¬ 
tor  similar  to  that  used  by  Lucas  and  Kanade  [27].  This  is 
based  on  a  locally  planar  model  of  the  image.  For  each  patch 
P  of  the  image,  it  computes  the  image  displacement  that 


minimizes  the  squared-error  function: 


where  I,(x)  and  Ir(x)  are  the  left  and  right  images,  and  d  is 
the  disparity  at  the  center  of  patch  P.  Approximating  the 
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Fig.  14.  (a)  Left  and  right  eye  views  of  a  random  dot  stereogram.  The  two  images  are  identical,  except  that  a  central  rectangular 

portion  is  displaced  by  1  pixel  between  views,  (b)  Overlaid  plots  of  the  left  half  of  a  horizontal  scan  line  from  each  stereo  image,  (c) 
Surface  plot  of  the  corresponding  depth  map.  (d)  Reconstruction  of  one  subband  of  a  wavelet  representation  of  each  of  the  stereo 
images,  (e)  Overlaid  plots  of  the  left  half  of  a  horizontal  scan  line  from  each  wavelet  subband,  (f)  Surface  plot  of  the  depth  map 
recovered  by  applying  the  stereo  matching  algorithm  to  the  wavelet  subbands. 
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Fig.  14.  (g)  Reconstruction  of  one  subband  of  a  steerable  pyramid  representation  of  each  of  the  stereo  images,  (h)  Overlaid  plots 

of  the  left  half  of  a  horizontal  scan  line  from  each  pyramid  subband,  (i)  Surface  plot  of  the  depth  map  recovered  by  applying  the 
stereo  matching  algorithm  to  the  steerable  pyramid  subbands. 


displaced  images  with  a  first-order  Taylor  expansion  and 
solving  for  the  minimum-error  estimate  gives 


d  = 


£^d[/r(*) -/,(*)] 

xeP  uX 


£ 

xeP 


di(> 


dx 


where  I(x)  is  the  pointwise  average  of  the  two  images.  In 
our  example,  we  used  a  patch  size  of  nine  pixels,  weighted 
by  a  gaussian  profile. 

The  steerable  pyramid  coefficients  corresponding  to  a  sin¬ 
gle  orientation  and  scale,  are  shown  in  Fig.  14(g).  Fig.  14(h) 
shows  graphs  of  the  left  half  of  the  corresponding  horizontal 
cross-sections.  The  left  portions  of  the  cross-sections  are 
identical.  Since  the  subbands  are  not  aliased,  the  right  por¬ 
tions  of  the  two  cross-sections  are  approximately  translations 
of  each  other  (if  we  had  interpolated  them  using  the  correct 
interpolator,  they  would  be  exact  translations).  Use  of  the 
Lucas  and  Kanade  algorithm  on  these  subbands  produces  a 
good  estimate  of  disparity.  The  resulting  depth  map  is  shown 
in  Fig.  14(i). 

Fig.  14(d)  shows  the  Daubechies  four-tap  wavelet  repre¬ 
sentation  of  the  stereo  pair  reconstructed  from  a  single  sub¬ 
band  chosen  to  match  that  used  in  the  steerable  pyramid 
example.  Fig.  14(e)  shows  the  left  half  of  the  corresponding 
plots  of  horizontal  cross-sections.  Note  that  the  right  portions 
of  the  two  cross-sections  are  not  related  by  translation.  As  in 


the  example  of  Fig.  1,  translation  of  the  input  signal  causes 
an  exchange  of  power  amongst  the  subbands,  and  this  alias¬ 
ing  in  the  representation  produces  large  errors  in  the  disparity 
estimates.  The  resulting  depth  estimate  is  poor,  as  can  be 
seen  in  Fig.  14(f). 

We  emphasize  that  the  choice  of  a  different  displacement 
estimation  technique,  such  as  correlation,  will  not  solve  this 
problem:  the  aliasing  in  the  subbands  causes  an  irreversible 
loss  of  information.  One  could  apply  a  low-pass  filter  to  each 
subband  to  eliminate  those  frequencies  that  are  contaminated 
by  aliasing,  but  this  would  leave  gaps  in  the  spectrum  which 
would  be  ignored  by  the  estimator.  An  input  signal  with 
significant  spectral  content  in  these  gaps  (and  not  elsewhere) 
could  not  be  processed. 

B.  Image  Enhancement 

A  common  problem  in  image  processing  is  to  restore  an 
image  that  has  been  corrupted  by  noise.  Through  the  use  of 
an  image  transform,  one  can  decompose  the  image  into  a  set 
of  coefficients  that  allow  discrimination  between  image  infor¬ 
mation  and  noise.  These  coefficients  can  be  modified  accord¬ 
ing  to  the  likelihood  that  they  represent  signal  rather  than 
noise.  The  “cleaned”  image  can  then  be  constructed  by 
inverting  the  transform,  (e.g.,  [37],  [23],  [6]). 

Natural  images  tend  to  contain  locally  oriented  structures 
such  as  lines,  edges,  and  textures,  while  noise  tends  to  be 
isotropic  (i.e.,  without  preferred  orientation).  Therefore,  an 
image  decomposition  based  on  oriented  filters  should  help 
one  to  determine  whether  a  given  image  structure  is  due  to 
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Fig.  15.  Noise  removal  example.  Figures  on  the  right  are  enlarged  portions  of  those  on  the  left,  (a)  Original  noise-free  image,  (b) 
Image  corrupted  by  noise.  SNR  is  12.42  dB.  (c)  Results  of  image  restoration  using  steerable  pyramid.  SNR  is  23.0  dB.  (d)  Results 
of  image  restoration  using  a  Wiener  filter  (see  text).  SNR  is  19.24  dB. 
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image  or  noise,  and  should  allow  one  to  enhance  the  image 
quality  (cf.  [23]).  For  these  reasons,  the  steerable  pyramid 
representation  described  in  the  previous  section  is  well-suited 
for  image  enhancement.  Shiftability  in  orientation  and  space 
allow  all  orientations  and  positions  to  be  treated  uniformly. 
The  multi-scale  nature  of  the  representation  allows  the  pro¬ 
cessing  to  depend  on  the  spatial  frequency  band. 

We  want  to  preserve  the  contents  of  the  image  representa¬ 
tion  that  correspond  to  visually  significant  oriented  structures 
in  the  image.  These  structures  may  appear  at  any  relative 
phase:  even-phase  (lines),  odd-phase  (edges),  or  contours  of 
intermediate  phase.  A  local  “oriented  energy”  measure, 
formed  by  the  sum  of  squares  of  a  quadrature  pair  of  even- 
and  odd-phase  filters,  will  treat  structures  of  different  phases 
uniformly  (cf.  [36]).  Thus,  we  construct  two  steerable  pyra¬ 
mids,  one  using  odd-symmetric  filters,  and  the  other  using 
their  even-symmetric  Hilbert  transform  counterparts,  and  use 
the  sum  of  squares  of  the  coefficients  at  each  position, 
orientation  and  scale  to  measure  orientation  strength. 

We  use  this  oriented  energy  measure  to  determine  the 
appropriate  modification  of  the  coefficients  in  one  of  the 
pyramids  (although  we  use  both  of  the  quadrature  pair  of 
filters  to  compute  energy,  only  one  is  needed  to  reconstruct 
the  enhanced  image).  If  the  oriented  energy  is  large,  the 
pyramid  coefficient  at  that  orientation  is  left  unchanged; 
otherwise  it  is  be  attenuated.  To  perform  this  operation,  we 
use  a  soft  threshold  function  similar  to  that  described  in  [6]: 

c, 

(1  +  exp(-5(x/eT  -  r)))  ’  '  ^ 

where  c,  is  the  coefficient  at  a  particular  orientation,  e,  is  the 
energy  at  that  orientation,  and  S  and  T  are  sharpness  and 
threshold  parameters  that  are  chosen  for  each  subband. 

We  use  the  steerability  of  the  image  representation  to 
ensure  that  the  noise  processing  is  independent  of  image 
orientation.  At  each  scale  and  position,  we  analytically  find 
the  local  dominant  orientation  from  the  quadrature  pair  of 
steerable  filters  [15].  For  the  case  of  a  single  oriented  struc¬ 
ture  this  is  the  orientation  which  maximizes  the  energy 
measure  er  We  then  steer  each  of  the  odd  phase  steerable 
pyramid  filters  so  that  one  of  them  is  aligned  with  the 
dominant  orientation.  We  modify  the  steered  coefficients 
according  to  (17).  After  this  modification,  we  steer  the 
coefficients  back  to  their  original  orientations,  and  recon¬ 
struct  the  image  from  these  altered  coefficients.  Altering  the 
coefficients  in  a  coordinate  system  defined  by  the  local  domi¬ 
nant  orientation  ensures  that  the  image  will  be  processed  in  a 
rotation-invariant  manner. 

Fig.  15(a)  shows  the  original  noise-free  image,  along  with 
an  enlarged  portion.  Pixel  intensity  values  are  in  the  range 
[30,  224].  Fig.  15(b)  shows  the  image  corrupted  by  white 
noise  uniformly  distributed  on  the  interval  [-20,20].  The 
SNR  for  this  image  is  12.42  dB,  or  13.6  dB  peak-to-peak.  To 
compute  a  noise-reduced  image,  we  constructed  two  levels  of 
a  steerable  pyramid  from  the  noisy  image,  and  modified  the 
coefficients  as  previously  described,  using  a  sharpness  pa¬ 
rameter,  S  =  0.5,  and  a  threshold,  T  =  2  for  the  lower 


frequency  band  and  T  =  11  for  the  higher  frequency  band. 
The  same  parameters  were  used  for  all  orientations.  These 
values  were  roughly  chosen  to  minimize  the  mean  squared 
error.  The  pyramid  transform  was  then  inverted  to  produce 
the  processed  image,  shown  in  Fig.  15(c).  Very  little  noise 
remains  in  the  processed  image,  yet  most  important  image 
features  are  preserved.  The  SNR  of  this  result  is  23.0  dB. 

For  comparison  with  the  steerable  pyramid  algorithm,  we 
also  restored  the  image  using  a  Wiener  filter.  Implementation 
of  the  Wiener  filter  requires  estimates  of  the  power  spectra  of 
both  the  image  and  the  noise.  For  the  image  spectrum 
estimate  we  used  the  average  of  the  power  spectra  of  an 
ensemble  of  eight  different  face  images,  taken  under  condi¬ 
tions  identical  to  those  used  for  Fig.  15(a).  We  used  a  flat 
power  spectrum  model  for  the  noise,  with  amplitude  equal  to 
the  mean  of  the  spectral  power  of  the  actual  noise.  Fig.  15(d) 
shows  the  Wiener  filter  restoration  result,  with  a  SNR  of 
19.24  dB.  While  this  value  is  only  moderately  lower  than  that 
of  the  steerable  pyramid  method,  the  visual  appearance  of  the 
restored  image  is  both  noisier  and  more  blurred  than  the 
steerable  pyramid  result. 

VII.  Conclusion 

The  recent  development  of  wavelet  transforms  has  gener¬ 
ated  enthusiasm  and  controversy  in  a  broad  range  of  disci¬ 
plines.  The  defining  property  of  these  transforms  is  that  the 
basis  functions  are  dilations  and  translations  of  a  common 
kernel  (in  two  dimensions,  we  add  rotations  to  this  list).  Two 
secondary  properties  of  typical  wavelet  transforms  are  that 
they  are  orthogonal,  and  they  may  be  implemented  using 
recursive  pyramid  structures.  Their  representational  and 
computational  structure  makes  them  suitable  for  efficient 
signal  and  image  coding.  We  have  argued,  however,  that  the 
aliasing  that  results  from  the  critical  sampling  constraint  (as 
required  by  orthogonality)  is  often  problematic  for  applica¬ 
tions  in  signal  or  image  analysis. 

In  the  spatial  domain,  one  would  like  an  image  representa¬ 
tion  to  treat  its  input  in  a  uniform  manner,  regardless  of  the 
relative  alignment  of  the  input  and  the  transform  sampling 
lattices.  No  subsampled  subband  transform  can  be  translation 
invariant,  in  the  strong  sense  that  the  transformation  operator 
commutes  with  the  translation  operator.  But  is  is  possible  to 
generate  transforms  in  which  the  information  (and  the  power) 
contained  within  a  given  subband  is  invariant  to  translations 
of  the  input  signal.  Such  transforms  must  generally  be  over¬ 
sampled;  the  Nyquist  criterion  specifies  the  necessary  sam¬ 
pling  rate. 

We  have  developed  analogous  concepts  in  the  context  of 
orientation  analysis.  It  is  possible  to  devise  “steerable” 
image  representations  in  which  the  power  in  the  coefficients 
corresponding  to  different  orientations  (at  the  same  scale  and 
position)  is  invariant  to  rotations  of  the  input  signal.  The 
concept  of  steerability  has  been  explored  in  prior  work;  here 
we  have  demonstrated  that  is  is  essentially  equivalent  to  the 
translation  invariance  issue  of  the  spatial  domain. 

We  also  discussed  transforms  that  are  “scalable,”  in  that 
measurements  at  any  scale  can  be  derived  from  those  at  a 
discrete  set  of  scales.  The  scalable  representations  require 


606 


IEEE  TRANSACTIONS  ON  INFORMATION  THEORY,  VOL.  38,  NO.  2,  MARCH  1992 


that  we  take  our  space-domain  concepts  and  re-state  them  in 
the  log  frequency  domain. 

We  have  formalized  a  generalization  of  these  properties 
which  we  call  “shiftability”.  We  showed  that  shiftability  is 
equivalent  to  the  invariance  of  the  transform  power  to  input 
signal  translation  (Proposition  2),  and  that  a  sub-property  of 
shiftability  is  that  the  sum  of  squares  of  the  basis  functions  is 
constant  (Corollary  1).  We  discussed  the  property  of  self- 
invertibility  and  showed  that  this  property  holds  for  trans¬ 
forms  that  are  shiftable  in  space  and  have  flat  basis  function 
power  in  the  frequency  domain  (proposition  4).  We  also 
discussed  the  issue  of  joint  shiftability. 

We  have  demonstrated  these  ideas  by  designing  one  trans¬ 
form  that  is  jointly  shiftable  in  space  and  orientation  (a 
“steerable  pyramid”),  and  another  that  is  jointly  shiftable  in 
space  and  scale.  Although  these  transforms  are  less  efficient 
that  critically  sampled  representations,  we  feel  that  for  many 
applications,  the  benefits  of  shiftability  are  worth  the  cost. 
We  demonstrate  advantages  of  the  transforms  by  applying 
them  to  the  problems  of  stereo  matching,  scale-space  analy¬ 
sis,  and  image  enhancement. 

There  are  open  issues  remaining.  The  relationship  of  the 
shiftability  property  to  the  mathematics  of  groups  could  be 
explored.  A  measure  of  joint  shiftability  might  be  used  in 
place  of  Gabor’s  joint  localization  constraint  to  develop  ap¬ 
proximately  jointly  shiftable  transforms  for  a  given  sampling 
structure.  The  practical  problems  in  the  design  of  scalable 
transforms  should  be  investigated  further.  In  particular,  it  is 
of  interest  to  construct  a  scalable  transform  that  is  efficiently 
implemented  as  a  recursive  pyramid.  Finally,  the  shiftability 
concept  could  be  extended  to  other  domains  such  as  time  and 
color. 
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